Data Quality Not Your Typical Database Problem
Abstract
Textbook database examples are often wrong and simplistic. Unfortunately, data is never born clean or pure: errors, missing values, repeated entries, inconsistent instances, and unsatisfied business rules are the norm rather than the exception. Data cleaning (also known as data cleansing or record linkage, among many other terms) is emerging both as a major application requirement and as an interdisciplinary research area. In this talk, we will start by discussing some of the major issues and challenges in building effective and efficient data cleaning solutions. We will then discuss open challenges and critique current conservative approaches to this critical problem. Finally, we will discuss some of our work at QCRI in this area.
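To make the error classes listed above concrete, here is a minimal Python sketch that flags each one on a toy table. Everything in it is an illustrative assumption rather than the talk's or QCRI's method: the records, the field names, the `zip -> city` functional dependency used as the "inconsistency" check, the `age >= 0` business rule, and the naive exact-key duplicate matching (real record linkage uses fuzzy similarity and blocking).

```python
from collections import defaultdict

# Hypothetical toy records; all field names and values are illustrative assumptions.
RECORDS = [
    {"id": 1, "name": "Alice Smith", "zip": "10001", "city": "New York", "age": 34},
    {"id": 2, "name": "alice smith ", "zip": "10001", "city": "New York", "age": 34},
    {"id": 3, "name": "Bob Jones", "zip": "10001", "city": "Boston", "age": 41},
    {"id": 4, "name": "Carol White", "zip": None, "city": "Chicago", "age": -5},
]


def missing_values(records):
    """Return (id, field) pairs whose value is None or an empty string."""
    return [(r["id"], k) for r in records for k, v in r.items() if v is None or v == ""]


def normalize(name):
    """Crude matching key: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def duplicate_groups(records):
    """Group ids that share a normalized name (a naive record-linkage key)."""
    groups = defaultdict(list)
    for r in records:
        groups[normalize(r["name"])].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def fd_violations(records, lhs, rhs):
    """Find lhs values mapped to more than one rhs value, i.e. violations of lhs -> rhs."""
    seen = defaultdict(set)
    for r in records:
        if r[lhs] is not None:  # skip missing keys; those are reported separately
            seen[r[lhs]].add(r[rhs])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


def rule_violations(records, rule):
    """Return ids of records that fail a business-rule predicate."""
    return [r["id"] for r in records if not rule(r)]


if __name__ == "__main__":
    print(missing_values(RECORDS))                              # [(4, 'zip')]
    print(duplicate_groups(RECORDS))                            # [[1, 2]]
    print(fd_violations(RECORDS, "zip", "city"))                # {'10001': ['Boston', 'New York']}
    print(rule_violations(RECORDS, lambda r: r["age"] >= 0))    # [4]
```

Note that the functional-dependency check only detects that records 1–3 disagree on the city for `10001`; deciding which value to repair is the harder part of data cleaning and is not attempted here.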
Similar resources
Web and Information Technologies
…s of the Invited Talks: Towards Automated Information Factories (Aris M. Ouksel); Data Quality Not Your Typical Database Problem
Data Quality – The Fuel that Drives the Business Engine
In today’s information age, companies will live and die by information. This information is the fuel that drives the business engine. As more and more data is collected, the reality of a multichannel world that includes e-commerce, direct sales, call centers and existing systems sets in. Bad data is affecting companies at an alarming rate and the dilemma is clear: how can a company ensure that i...
The Main Steps to Data Quality
To gain knowledge from your data, your data has to be of high quality. Poor data quality is increasingly a problem for companies that start to exploit their data stocks. This article shows the main obstacles on the way to perfect data quality. It is based on our experience in improving data quality in large customer or business partner databases. The examples mentioned in this paper ...
The sequelae of misinterpretating surgical outcome data.
On a normal working day, a plot is presented to you that will change your life forever. The plot (Fig. 1A) shows your variable life-adjusted display (VLAD) curves plotted against the VLAD curves of your ‘competing’ colleagues. This VLAD curve, a real case, a real plot (the surgeon is not an author of this letter), presents a plot of your cumulative sum of the difference in expected and obser...
Completeness in the Relational Model: a Comprehensive Framework
Completeness is a well known data quality dimension in the area of databases. Intuitively, a database is complete if it represents every fact of the real world coherent with the database semantics, i.e. its intension. In the paper, we provide a comprehensive framework for characterizing completeness in the relational model, investigating several different paradigms typical of database models, s...